Decoding billions of integers per second through vectorization
In many important applications -- such as search engines and relational
database systems -- data is stored in the form of arrays of integers. Encoding
and, most importantly, decoding of these arrays consumes considerable CPU time.
Therefore, substantial effort has been made to reduce costs associated with
compression and decompression. In particular, researchers have exploited the
superscalar nature of modern processors and SIMD instructions. Nevertheless, we
introduce a novel vectorized scheme called SIMD-BP128 that improves over
previously proposed vectorized approaches. It is nearly twice as fast as the
previously fastest schemes on desktop processors (varint-G8IU and PFOR). At the
same time, SIMD-BP128 saves up to 2 bits per integer. For even better
compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has
a compression ratio within 10% of a state-of-the-art scheme (Simple-8b) while
being two times faster during decoding.

Comment: For software, see https://github.com/lemire/FastPFor; for data, see http://boytsov.info/datasets/clueweb09gap
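The core idea behind binary packing schemes such as SIMD-BP128 can be sketched in plain (scalar, unvectorized) Python: store the deltas of a sorted integer array using only as many bits as the largest delta requires. This is only an illustration of the packing layout, not the paper's SIMD implementation.

```python
# Scalar sketch of binary packing, the idea underlying SIMD-BP128:
# each block of integers is stored using b bits per value, where b is
# the bit width of the largest value in the block. (The paper packs
# 128-integer blocks with SIMD instructions; this sketch is scalar.)

def pack(values, b):
    """Pack each value into b bits of a single integer buffer."""
    buf = 0
    for i, v in enumerate(values):
        assert v < (1 << b), "value does not fit in b bits"
        buf |= v << (i * b)
    return buf

def unpack(buf, b, n):
    """Recover n values of b bits each."""
    mask = (1 << b) - 1
    return [(buf >> (i * b)) & mask for i in range(n)]

# Deltas of a sorted posting list are small, so few bits suffice.
postings = [3, 7, 12, 13, 20]
deltas = [postings[0]] + [y - x for x, y in zip(postings, postings[1:])]
bits = max(deltas).bit_length()   # 3 bits per value instead of 32
packed = pack(deltas, bits)
assert unpack(packed, bits, len(deltas)) == deltas
```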
Fast Hands-free Writing by Gaze Direction
We describe a method for text entry based on inverse arithmetic coding that
relies on gaze direction and which is faster and more accurate than using an
on-screen keyboard.
These benefits are derived from two innovations: the writing task is matched
to the capabilities of the eye, and a language model is used to make
predictable words and phrases easier to write.

Comment: 3 pages. Final version
Investigating five-key predictive text entry with combined distance and keystroke modelling
This paper investigates text entry on mobile devices using only five keys. Primarily intended to support text entry on devices smaller than mobile phones, the method can also be used to maximise screen space on mobile phones. The reported combined Fitts' law and keystroke modelling predicts performance with bigram prediction on a five-key keypad similar to that currently achieved on standard mobile phones using unigram prediction. User studies reported here show user performance on five-key pads similar to that found elsewhere for novice nine-key pad users.
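Fitts' law, the movement-time model combined with keystroke-level modelling in this line of work, predicts how long it takes to reach a key of a given size at a given distance. The constants a and b below are illustrative, not the paper's fitted values.

```python
# Fitts' law (Shannon formulation): MT = a + b * log2(D / W + 1).
# Constants a and b are illustrative placeholders.
import math

def fitts_time(distance, width, a=0.1, b=0.15):
    """Predicted movement time in seconds to a target of the given
    width at the given distance."""
    return a + b * math.log2(distance / width + 1)

# Moving four times as far to a same-sized key costs well under four
# times the time, which is why compact keypads can stay competitive.
t_near = fitts_time(distance=1.0, width=1.0)   # adjacent key
t_far = fitts_time(distance=4.0, width=1.0)    # far key
assert t_far < 2 * t_near
```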
Towards an automated classification of spreadsheets
Many spreadsheets in the wild have neither documentation nor a category associated with them. This makes it difficult to apply spreadsheet research that targets specific domains such as finance or databases. In this paper we introduce a methodology to automatically classify spreadsheets into different domains, exploiting existing data-mining classification algorithms with spreadsheet-specific features. The algorithms were trained and validated with cross-validation on the EUSES corpus, reaching up to 89% accuracy. The best algorithm was then applied to the larger Enron corpus, both to gain insight from it and to demonstrate the usefulness of this work.
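The pipeline described above can be sketched with an off-the-shelf classifier and cross-validation. The feature names and toy data below are invented for illustration; the paper trains on spreadsheet-specific features extracted from the EUSES corpus.

```python
# Hedged sketch: train a standard data-mining classifier on
# spreadsheet-specific features and validate with cross-validation.
# Features and rows below are hypothetical, not from the paper.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical features per spreadsheet:
# [n_formulas, n_numeric_cells, n_text_cells, max_sheet_width]
X = [
    [120, 900, 40, 12],   # financial-looking sheets
    [95, 700, 55, 10],
    [110, 850, 30, 14],
    [5, 40, 600, 25],     # database-looking sheets
    [8, 60, 550, 30],
    [3, 30, 640, 22],
]
y = ["financial", "financial", "financial",
     "database", "database", "database"]

clf = RandomForestClassifier(n_estimators=50, random_state=0)
scores = cross_val_score(clf, X, y, cv=3)   # 3-fold cross-validation
print(scores.mean())
```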
Learning Aligned-Spatial Graph Convolutional Networks for Graph Classification
In this paper, we develop a novel Aligned-Spatial Graph Convolutional Network (ASGCN) model to learn effective features for graph classification. Our idea is to transform arbitrary-sized graphs into fixed-sized aligned grid structures, and define a new spatial graph convolution operation associated with the grid structures. We show that the proposed ASGCN model not only reduces the problems of information loss and imprecise information representation arising in existing spatially-based Graph Convolutional Network (GCN) models, but also bridges the theoretical gap between traditional Convolutional Neural Network (CNN) models and spatially-based GCN models. Moreover, the proposed ASGCN model can adaptively discriminate the importance between specified vertices during the process of spatial graph convolution, which helps explain its effectiveness. Experiments on standard graph datasets demonstrate the effectiveness of the proposed model.
Identifying Critical States by the Action-Based Variance of Expected Return
The balance of exploration and exploitation plays a crucial role in
accelerating reinforcement learning (RL). To deploy an RL agent in human
society, its explainability is also essential. However, basic RL approaches
have difficulties in deciding when to choose exploitation as well as in
extracting useful points for a brief explanation of its operation. One reason
for the difficulties is that these approaches treat all states the same way.
Here, we show that identifying critical states and treating them specially is
commonly beneficial to both problems. These critical states are the states at
which the action selection changes the potential of success and failure
substantially. We propose to identify the critical states using the variance in
the Q-function for the actions and to perform exploitation with high
probability on the identified states. These simple methods accelerate RL in a
grid world with cliffs and two baseline tasks of deep RL. Our results also
demonstrate that the identified critical states are intuitively interpretable
regarding the crucial nature of the action selection. Furthermore, our analysis
of the relationship between the timing of the identification of especially
critical states and the rapid progress of learning suggests there are a few
especially critical states that have important information for accelerating RL
rapidly.

Comment: 12 pages, 6 figures
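The criterion described above can be sketched directly: flag a state as critical when the variance of its Q-values across actions is high, then exploit (act greedily) with high probability on those states. The threshold and exploration rates below are illustrative, not the paper's settings.

```python
# Minimal sketch of identifying critical states by the action-based
# variance of expected return, and exploiting on them. Threshold and
# epsilon values are illustrative placeholders.
import random
import statistics

def is_critical(q_values, threshold=1.0):
    """Variance of the Q-function over actions at this state."""
    return statistics.pvariance(q_values) > threshold

def select_action(q_values, eps=0.3, eps_critical=0.01):
    """Epsilon-greedy, but near-pure exploitation on critical states."""
    eps_here = eps_critical if is_critical(q_values) else eps
    if random.random() < eps_here:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# A cliff-edge state: one action is much worse than the others, so the
# action choice changes the potential of success and failure.
assert is_critical([0.9, 0.8, -5.0])       # high variance -> critical
assert not is_critical([0.5, 0.45, 0.55])  # similar returns -> not critical
```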
Performing Feature Selection with ACO
The main aim of feature selection (FS) is to determine a minimal feature subset from a problem domain while retaining a suitably high accuracy in representing the original features. In real-world problems FS is a must due to the abundance of noisy, irrelevant, or misleading features; however, current methods are inadequate at finding optimal reductions. This chapter presents a feature selection mechanism based on Ant Colony Optimization (ACO) in an attempt to combat this. The method is then applied to the problem of finding optimal feature subsets in the fuzzy-rough data reduction process. The present work is applied to two very different challenging tasks, namely web classification and complex systems monitoring.
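A minimal ACO loop for feature selection can be sketched as follows, assuming a caller-supplied fitness function (e.g. the accuracy of a classifier on the candidate subset). The pheromone update rule here is a basic evaporation-plus-reinforcement scheme; the chapter's fuzzy-rough evaluation is not reproduced.

```python
# Illustrative Ant Colony Optimization for feature selection: ants
# sample feature subsets with probability driven by pheromone levels,
# and the best subset found so far reinforces its features' pheromone.
import random

def aco_feature_selection(n_features, fitness, n_ants=10, n_iters=20,
                          rho=0.1, seed=0):
    rng = random.Random(seed)
    pheromone = [1.0] * n_features
    best_subset, best_fit = None, float("-inf")
    for _ in range(n_iters):
        total = sum(pheromone)
        for _ in range(n_ants):
            # Each ant includes feature i with probability proportional
            # to its share of the pheromone.
            subset = [i for i in range(n_features)
                      if rng.random() < pheromone[i] / total * n_features * 0.5]
            if not subset:
                continue
            fit = fitness(subset)
            if fit > best_fit:
                best_subset, best_fit = subset, fit
        # Evaporate, then reinforce the features of the best subset.
        pheromone = [(1 - rho) * p for p in pheromone]
        for i in best_subset or []:
            pheromone[i] += rho * best_fit
    return best_subset, best_fit

# Toy fitness: reward subsets containing features 1 and 3, penalize size.
fitness = lambda s: (1 in s) + (3 in s) - 0.1 * len(s)
subset, fit = aco_feature_selection(6, fitness)
```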
Bekenstein entropy bound for weakly-coupled field theories on a 3-sphere
We calculate the high temperature partition functions for SU(Nc) or U(Nc)
gauge theories in the deconfined phase on S^1 x S^3, with scalars, vectors,
and/or fermions in an arbitrary representation, at zero 't Hooft coupling and
large Nc, using analytical methods. We compare these with numerical results
which are also valid in the low temperature limit and show that the Bekenstein
entropy bound resulting from the partition functions for theories with any
amount of massless scalar, fermionic, and/or vector matter is always satisfied
when the zero-point contribution is included, while the theory is sufficiently
far from a phase transition. We further consider the effect of adding massive
scalar or fermionic matter and show that the Bekenstein bound is satisfied when
the Casimir energy is regularized under the constraint that it vanishes in the
large mass limit. These calculations can be generalized straightforwardly for
the case of a different number of spatial dimensions.

Comment: 32 pages, 12 figures. v2: Clarifications added. JHEP version
Investigating the effectiveness of client-side search/browse without a network connection
Search and browse, incorporating elements of information retrieval and database operations, are core services in most digital repository toolkits. These are often implemented using a server-side index, such as that produced by Apache Solr. However, sometimes a small collection needs to be static and portable, or stored client-side. It is proposed that, in these instances, browser-based search and browse is possible, using standard facilities within the browser. This was implemented and evaluated for varying behaviours and collection sizes. The results show that fast performance can be achieved for typical queries on small- to medium-sized collections.
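The client-side approach can be sketched as a small in-memory inverted index shipped with the static collection. A real deployment would build this in JavaScript inside the browser; Python is used here only to illustrate the data structure, and the documents are invented.

```python
# Sketch of a client-side search core: an inverted index mapping each
# term to the set of documents containing it, queried conjunctively.
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Conjunctive (AND) query over the index."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {
    "d1": "digital repository search and browse",
    "d2": "client side search without a network connection",
    "d3": "server side index with Apache Solr",
}
index = build_index(docs)
assert search(index, "search") == {"d1", "d2"}
assert search(index, "side search") == {"d2"}
```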
Using deep learning for ordinal classification of mobile marketing user conversion
In this paper, we explore Deep Multilayer Perceptrons (MLP) to perform an ordinal classification of mobile marketing conversion rate (CVR), allowing the value of product sales to be measured when a user clicks an ad. As a case study, we consider big data provided by a global mobile marketing company. Several experiments were held, considering a rolling window validation, different datasets, learning methods and performance measures. Overall, competitive results were achieved by an online deep learning model, which is capable of producing real-time predictions.

This article is a result of the project NORTE-01-0247-FEDER-017497, supported by Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF). This work was also supported by Fundação para a Ciência e Tecnologia (FCT) within the Project Scope: UID/CEC/00319/201
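The ordinal framing mentioned in the abstract above can be sketched with the standard cumulative-label encoding, where an ordinal label k is replaced by binary targets "is the label greater than i?". This illustrates the ordinal setup only; the paper's deep MLP models are not reproduced, and the bucket names are invented.

```python
# Hedged sketch of cumulative-label encoding for ordinal classes,
# a common way to cast ordinal classification as binary subproblems.

def to_cumulative(labels, n_classes):
    """Encode ordinal labels 0..n_classes-1 as n_classes-1 binary targets."""
    return [[1 if y > t else 0 for t in range(n_classes - 1)] for y in labels]

def from_cumulative(row):
    """Decode by counting how many thresholds are exceeded."""
    return sum(row)

# Three hypothetical CVR buckets: 0 = no sale, 1 = low, 2 = high value.
encoded = to_cumulative([0, 2, 1], n_classes=3)
assert encoded == [[0, 0], [1, 1], [1, 0]]
assert [from_cumulative(r) for r in encoded] == [0, 2, 1]
```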